A. Embedded Coding

An embedded code represents a sequence of binary decisions that distinguish an image from the "null," or all gray, image. Since the embedded code contains all lower rate codes "embedded" at the beginning of the bit stream, effectively, the bits are "ordered in importance." Using an embedded code, an encoder can terminate the encoding at any point, thereby allowing a target rate or distortion metric to be met exactly. Typically, some target parameter, such as bit count, is monitored in the encoding process. When the target is met, the encoding simply stops. Similarly, given a bit stream, the decoder can cease decoding at any point and can produce reconstructions corresponding to all lower-rate encodings.
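As a toy illustration of an embedded bit stream (not the EZW coder itself), the sketch below sends nonnegative integer samples one bit plane at a time, most significant plane first. The function names `encode_bitplanes` and `decode_prefix` are hypothetical. Because the bits are ordered by importance, the encoder can stop at any bit budget, and any prefix of the stream decodes to a coarser approximation of the same data.

```python
def encode_bitplanes(values, num_planes, budget):
    """Emit bits plane by plane, most significant plane first.

    Encoding stops as soon as `budget` bits have been produced.
    """
    bits = []
    for plane in range(num_planes - 1, -1, -1):
        for v in values:
            if len(bits) == budget:
                return bits  # target rate met: the encoder simply stops
            bits.append((v >> plane) & 1)
    return bits


def decode_prefix(bits, count, num_planes):
    """Rebuild `count` values from whatever prefix of the stream was received."""
    values = [0] * count
    for i, b in enumerate(bits):
        plane = num_planes - 1 - i // count  # which bit plane this bit belongs to
        values[i % count] |= b << plane
    return values


data = [13, 2, 7, 9]
for budget in (4, 8, 12, 16):
    print(budget, decode_prefix(encode_bitplanes(data, 4, budget), len(data), 4))
```

With four bit planes, the full 16-bit stream reproduces `data` exactly, while the first 8 bits decode to `[12, 0, 4, 8]`: the same samples, known to a coarser precision.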

Embedded coding is similar in spirit to binary finite-precision representations of real numbers. All real numbers can be represented by a string of binary digits. For each digit added to the right, more precision is added. Yet, the "encoding" can cease at any time and provide the "best" representation of the real number achievable within the framework of the binary digit representation. Similarly, the embedded coder can cease at any time and provide the "best" representation of an image achievable within its framework.
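The real-number analogy can be checked directly. This sketch (function names are illustrative) computes the first `n` binary digits of a number in [0, 1) using exact rational arithmetic. Every truncation is a prefix of every longer one, and the truncation error after `n` digits is below 2^-n.

```python
from fractions import Fraction


def binary_digits(x, n):
    """First n binary digits of x, where 0 <= x < 1."""
    digits = []
    for _ in range(n):
        x *= 2
        d = 1 if x >= 1 else 0
        digits.append(d)
        x -= d
    return digits


def value_of(digits):
    """Value of the truncated expansion 0.d1 d2 d3 ... in base 2."""
    return sum(Fraction(d, 2 ** (k + 1)) for k, d in enumerate(digits))


third = Fraction(1, 3)
print(binary_digits(third, 6))                     # [0, 1, 0, 1, 0, 1]
print(third - value_of(binary_digits(third, 6)))   # 1/192
```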

The embedded coding scheme presented here was motivated in part by universal coding schemes that have been used for lossless data compression in which the coder attempts to optimally encode a source using no prior knowledge of the source. An excellent review of universal coding can be found in [3]. In universal coders, the encoder must learn the source statistics as it progresses. In other words, the source model is incorporated into the actual bit stream. For lossy compression, there has been little work in universal coding. Typical image coders require extensive training for both quantization (both scalar and vector) and generation of nonadaptive entropy codes, such as Huffman codes. The embedded coder described in this paper attempts to be universal by incorporating all learning into the bit stream itself. This is accomplished by the exclusive use of adaptive arithmetic coding.
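To make "learning the statistics in the bit stream" concrete, the sketch below computes the ideal code length of an adaptive frequency-count model, the kind of model that can drive an adaptive arithmetic coder. The count initialization (every symbol starts at 1) is an assumption of this sketch, not a detail given in the text. Encoder and decoder update identical counts after each symbol, so no model parameters are transmitted separately.

```python
import math


def adaptive_code_length(symbols, alphabet):
    """Ideal code length, in bits, of an adaptive count-based model.

    Each symbol costs -log2(count / total) under the counts seen so far;
    the counts are then updated, exactly as a decoder could replicate.
    """
    counts = {s: 1 for s in alphabet}  # assumed initialization: one pseudo-count per symbol
    total = len(alphabet)
    bits = 0.0
    for s in symbols:
        bits += -math.log2(counts[s] / total)
        counts[s] += 1
        total += 1
    return bits


print(adaptive_code_length([0] * 100, [0, 1]))
```

For 100 zeros over a binary alphabet the product of the model probabilities telescopes to 1/101, so the cost is log2(101), about 6.66 bits, instead of the 100 bits a fixed one-bit-per-symbol code would spend. An arithmetic coder driven by the same counts comes close to this ideal figure.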

Intuitively, for a given rate or distortion, a nonembedded code should be more efficient than an embedded code, since it is free from the constraints imposed by embedding. In their theoretical work [9], Equitz and Cover proved that a successively refinable description can only be optimal if the source possesses certain Markovian properties.
allow the successful
coefficients across scales
as part of exponentially



ion which provides a com-
representation of the significant
tes the embedding algorithm,
whereby the ordering of im-
in order, by the precision
location of the wavelet
particular, that larger coeffi-
important than smaller coef-
their scale.

arithmetic coding which pro-
nt method for entropy coding
requires no training or pre-
arithmetic coder used in the ex-
version of that in [31].
sequentially and stops whenever
distortion is met. A target
ctly, and an operational rate-
can be computed



wavelet theory and multi-
an elegant methodology for
"anomalies" on a statistically
in image processing be-
thought of as anomalies in the
extremely important informa-
they are represented in only a
samples. Section III introduces
shows how zerotree coding
significance map of wavelet coef-
of significant informa-
IV discusses how successive
is used in conjunction with
coding to achieve efficient
follows on the protocol
order the bits in order of im-
is that the definition of im-
ordering information is based
uncertainty intervals as seen from
decoder can figure out. Thus,



there is no additional overhead to transmit this ordering information. Section V consists of a simple 8 x 8 example illustrating the various points of the EZW algorithm. Section VI discusses experimental results for various rates and for various standard test images. A surprising result is that using the EZW algorithm, terminating the encoding at an arbitrary point in the encoding process does not produce any artifacts that would indicate where in the picture the termination occurs. The paper concludes with Section VII.

II. Wavelet Theory and Multiresolution Analysis

A. Trends and Anomalies 

One of the oldest problems in statistics and signal processing is how to choose the size of an analysis window, block size, or record length of data so that statistics computed within that window provide good models of the signal behavior within that window. The choice of an analysis window involves trading the ability to analyze "anomalies," or signal behavior that is more localized in the time or space domain and tends to be wide band in the frequency domain, from "trends," or signal behavior that is more localized in frequency but persists over a large number of lags in the time domain. To model data as being generated by random processes so that computed statistics become meaningful, stationary and ergodic assumptions are usually required, which tend to obscure the contribution of anomalies.

The main contribution of wavelet theory and multiresolution analysis is that it provides an elegant framework in which both anomalies and trends can be analyzed on an equal footing. Wavelets provide a signal representation in which some of the coefficients represent long data lags corresponding to a narrow band, low frequency range, and some of the coefficients represent short data lags corresponding to a wide band, high frequency range. Using the concept of scale, data representing a continuous tradeoff between time (or space in the case of images) and frequency is available.
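The trend/anomaly distinction can be seen in a small experiment. The sketch below uses the orthonormal Haar filter pair as a stand-in (the coders in this paper use longer 9-tap filters) and decomposes a constant "trend" containing one isolated spike. The smooth part is carried entirely by the coarse approximation, while the spike shows up as a single nonzero detail coefficient at each scale, localized above the spike's position.

```python
import math


def haar_step(signal):
    """One level of orthonormal Haar analysis: (lowpass, highpass) halves."""
    s = 1 / math.sqrt(2)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def haar_multiscale(signal, levels):
    """Repeatedly split the lowpass half; details[0] is the finest scale."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


signal = [1.0] * 16   # a flat "trend"
signal[5] = 9.0       # one localized "anomaly"
approx, details = haar_multiscale(signal, 3)
for scale, d in enumerate(details, start=1):
    print(scale, [i for i, v in enumerate(d) if v != 0])
```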

For an introduction to the theory behind wavelets and multiresolution analysis, the reader is referred to several excellent tutorials on the subject [6], [7], [17], [18], [20], [26], [27].

B. Relevance to Image Coding

In image processing, most of the image area typically represents spatial "trends," or areas of high statistical spatial correlation. However "anomalies," such as edges or object boundaries, take on a perceptual significance that is far greater than their numerical energy contribution to an image. Traditional transform coders, such as those using the DCT, decompose images into a representation in which each coefficient corresponds to a fixed size spatial area and a fixed frequency bandwidth, where the bandwidth and spatial area are effectively the same for all coefficients in the representation. Edge information tends to

C. A Discrete Wavelet Transform
The discrete wavelet
identical to a hierarchical
bands are
position, the image is
critically subsampled as
represents a spatial area
a 2 x 2 area of the orig
represent a bandwidth ap-
< |ω| < π/2, whereas
band from π/2 < |ω| <
scale wavelet coefficients
or wavelet coefficients,
posed and critically
cess continues until some
in frequency and repre-
To begin the decom-
in Fig. 1. Each coefficient
corresponding to approximately
image. The low frequencies
proximately corresponding to 0
high frequencies represent the
The four subbands arise from
vertical and horizontal filters. The
and HH1 represent the finest
To obtain the next coarser scale
subband LL1 is further decom-
as shown in Fig. 2. The pro-
final scale is reached. Note that
coefficients represent a larger
a narrower band of frequen-
are three subbands; the remain-
is a representation of the
scales. The issues involved in
for the type of subband decom-
have been discussed by many
in this paper. Interested read-
[32], [35], in addition to ref-
bibliographies of the tutorial papers
Fig. 2. A two-scale wavelet decomposition: The image is divided into four subbands using separable filters. Each coefficient in the subbands LL2, LH2, HL2, and HH2 represents a spatial area corresponding to approximately a 4 x 4 area of the original picture. The low frequencies at this scale represent a bandwidth approximately corresponding to 0 < |ω| < π/4, whereas the high frequencies represent the band from π/4 < |ω| < π/2.

be a column vector whose elements are the array of coefficients resulting from the wavelet transform or subband decomposition applied to x. From the transform point of view, X represents a linear transformation of x represented by the matrix W, i.e.,

$$X = Wx. \qquad (1)$$

Although not actually computed this way, the effective filters that generate the subband signals from the original signal form basis functions for the transformation, i.e., the rows of W. Different coefficients in the same subband represent the projection of the entire image onto translates of a prototype subband filter, since from the subband point of view, they are simply regularly spaced different outputs of a convolution between the image and a subband filter. Thus, the basis functions for each coefficient in a given subband are simply translates of one another.
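A small numerical check of the matrix view in (1): the sketch builds W explicitly for a one-level orthonormal Haar transform of length n. Haar, and the lowpass-then-highpass row ordering, are assumptions chosen for brevity. Each row is a shifted copy of one of two prototype filters, so the coefficients within a subband are projections onto translates of one another, and W times its transpose is the identity.

```python
import math


def haar_matrix(n):
    """Rows are the basis functions of a one-level orthonormal Haar transform
    of even length n: n/2 lowpass rows followed by n/2 highpass rows."""
    s = 1 / math.sqrt(2)
    rows = []
    for sign in (1, -1):          # lowpass block first, then highpass block
        for k in range(n // 2):
            row = [0.0] * n
            row[2 * k] = s        # each row is the prototype shifted by 2k
            row[2 * k + 1] = sign * s
            rows.append(row)
    return rows


def matvec(W, x):
    """Plain matrix-vector product, X = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


W = haar_matrix(4)
print(matvec(W, [4.0, 2.0, 6.0, 8.0]))
```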

In subband coding systems [32], the coefficients from a given subband are usually grouped together for the purposes of designing quantizers and coders. Such a grouping suggests that statistics computed from a subband are in some sense representative of the samples in that subband. However, this statistical grouping once again implicitly de-emphasizes the outliers, which tend to represent the most significant features or edges. In this paper, the term "wavelet transform" is used because each wavelet coefficient is individually and deterministically compared to the same set of thresholds for the purpose of measuring significance. Thus, each coefficient is treated as a distinct, potentially important piece of data regardless of its scale, and no statistics for a whole subband are used in any form. The result is that the small number of "deterministically" significant fine scale coefficients are not obscured because of their "statistical" insignificance.

The filters used to compute the discrete wavelet transform in the coding experiments described in this paper are based on the 9-tap symmetric quadrature mirror filters (QMF) whose coefficients are given in. This transformation has also been called a QMF-pyramid. These filters were chosen because, in addition to their good localization properties, their symmetry allows for simple edge treatments, and they produce good results empirically. Additionally, using properly scaled coefficients, the transformation matrix for a discrete wavelet transform obtained using these filters is so close to unitary that it can be treated as unitary for the purpose of lossy compression. Since unitary transforms preserve L2 norms, it makes sense from a numerical standpoint to compare all of the resulting transform coefficients to the same thresholds to assess significance.
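Because an orthonormal transform preserves the sum of squares, a coefficient's magnitude carries the same weight in every subband, which is what justifies comparing all coefficients against one threshold. The sketch checks this with a multi-level orthonormal Haar transform, again a stand-in for the 9-tap QMF pyramid, which the text describes as only approximately unitary. The threshold value `T` is arbitrary.

```python
import math


def haar_analysis(x, levels):
    """Multi-level orthonormal Haar transform.

    Returns [approximation, coarsest detail, ..., finest detail].
    """
    s = 1 / math.sqrt(2)
    bands = []
    approx = list(x)
    for _ in range(levels):
        low = [(approx[i] + approx[i + 1]) * s for i in range(0, len(approx), 2)]
        high = [(approx[i] - approx[i + 1]) * s for i in range(0, len(approx), 2)]
        bands.insert(0, high)  # keep the coarsest detail band first
        approx = low
    return [approx] + bands


def energy(values):
    return sum(v * v for v in values)


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
coeffs = [c for band in haar_analysis(x, 3) for c in band]
T = 4.0
significant = [c for c in coeffs if abs(c) >= T]  # one threshold for every subband
print(energy(x), energy(coeffs), significant)
```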




III. Zerotrees of Wavelet Coefficients
In this section, an important aspect of low bit rate image coding is discussed: the coding of the positions of those coefficients that will be transmitted as nonzero values. Using scalar quantization followed by entropy coding, in order to achieve very low bit rates, i.e., less than 1 bit/pel, the probability of the most likely symbol after quantization (the zero symbol) must be extremely high. Typically, a large fraction of the bit budget must be spent on encoding the significance map, or the binary decision as to whether a sample, in this case a coefficient of a 2-D discrete wavelet transform, has a zero or nonzero quantized value. It follows that a significant improvement in encoding the significance map translates into a corresponding gain in compression efficiency.

A. Significance Map Encoding

To appreciate the importance of significance map encoding, consider a typical transform coding system where a decorrelating transformation is followed by an entropy-coded scalar quantizer. The following discussion is not intended to be a rigorous justification for significance map encoding, but merely to provide the reader with a sense of the relative coding costs of the position information contained in the significance map relative to amplitude and sign information.

A typical low-bit rate image coder has three basic components: a transformation, a quantizer, and data compression, as shown in Fig. 3. The original image is passed through some transformation to produce transform coefficients. This transformation is considered to be lossless, although in practice this may not be exactly the case. The transform coefficients are then quantized to produce a stream of symbols, each of which corresponds to an index of a particular quantization bin. Note that virtually all of the information loss occurs in the quantization stage. The data compression stage takes the stream of symbols and attempts to losslessly represent the data stream as efficiently as possible.

Fig. 3. A generic transform coder.
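The three stages can be sketched in a few lines. Assuming a uniform symmetric midtread quantizer with step size `step` (the text also allows nonuniform ones), the sketch maps coefficients to bin indices, reconstructs them, and reports the first-order entropy of the indices, which is the rate an ideal lossless stage could approach if the indices were independent. Only `quantize` discards information; the helper names are illustrative.

```python
import math
from collections import Counter


def quantize(coeffs, step):
    """Symmetric midtread uniform quantizer: bin index per coefficient (0 is the central bin)."""
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coeffs]


def dequantize(indices, step):
    """Reconstruct each coefficient at the center of its bin."""
    return [i * step for i in indices]


def empirical_entropy(symbols):
    """First-order entropy of the symbol stream, in bits per symbol."""
    n = len(symbols)
    return -sum(k / n * math.log2(k / n) for k in Counter(symbols).values())


coeffs = [0.2, -0.3, 5.1, -4.9, 0.0, 1.6]
step = 2.0
indices = quantize(coeffs, step)
recon = dequantize(indices, step)
print(indices, recon, empirical_entropy(indices))
```

The fraction of zero indices in `indices` plays the role of p in the cost analysis that follows.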

The goal of the transformation is to produce coefficients that are decorrelated. If we could, we would ideally like a transformation to remove all dependencies between samples. Assume for the moment that the transformation is doing its job so well that the resulting transform coefficients are not merely uncorrelated, but statistically independent. Also, assume that we have removed the mean and coded it separately so that the transform coefficients can be modeled as zero-mean, independent, although perhaps not identically distributed, random variables. Furthermore, we might additionally constrain the model so that the probability density functions (PDF) for the coefficients are symmetric.

The goal is to quantize the transform coefficients so that the entropy of the resulting distribution of bin indexes is small enough so that the symbols can be entropy-coded at some target low bit rate, say for example 0.5 bits per pixel (bpp). Assume that the quantizers will be symmetric midtread, perhaps nonuniform, quantizers, although different symmetric midtread quantizers may be used for different groups of transform coefficients. Letting the central bin be index 0, note that because of the symmetry, for a bin with a nonzero index magnitude, a positive or negative index is equally likely. In other words, for each nonzero index encoded, the entropy code is going to require at least one bit for the sign. An entropy code can be designed based on modeling probabilities of bin indices as the fraction of coefficients in which the absolute value of a particular bin index occurs. Using this simple model, and assuming that the resulting symbols are independent, the entropy of the symbols H can be expressed as

$$H = -p\log_2 p - (1-p)\log_2(1-p) + (1-p)\left[1 + H_{NZ}\right] \qquad (2)$$

where p is the probability that a transform coefficient is quantized to zero, and H_NZ represents the conditional entropy of the absolute values of the quantized coefficients conditioned on them being nonzero. The first two terms in the sum represent the first-order binary entropy of the significance map, whereas the third term represents the conditional entropy of the distribution of nonzero values multiplied by the probability of them being nonzero. Thus, we can express the true cost of encoding the actual symbols as follows:

$$\text{Total Cost} = \text{Cost of Significance Map} + \text{Cost of Nonzero Values}. \qquad (3)$$

Returning to the model, suppose that the target is H = 0.5. What is the minimum probability of zero achievable? Consider the case where we only use a 3-level quantizer, i.e., H_NZ = 0. Solving for p provides a lower bound on the probability of zero given the independence assumption:

$$p_{H_{NZ}=0}(H = 0.5) = 0.916. \qquad (4)$$

In this case, under ideal conditions, 91.6% of the coefficients must be quantized to zero. Furthermore, 83% of the bit budget is used in encoding the significance map. Consider a more typical example where H_NZ = 4; the minimum probability of zero is

$$p_{H_{NZ}=4}(H = 0.5) = 0.954. \qquad (5)$$

In this case, the probability of zero must increase, while the cost of encoding the significance map is still 54% of the cost. As the target rate decreases, the probability of zero increases, and the fraction of the encoding cost attributed to the significance map increases. Of course, the independence assumption is unrealistic, and in practice there are often additional dependencies between coefficients that can be exploited to further reduce the cost of encoding the significance map. Nevertheless, the conclusion is that no matter how optimal the transform, quantizer, or entropy coder, under very typical conditions the cost of determining the positions of the few significant coefficients represents a significant portion of the bit budget at low rates, and is likely to become an increasing fraction of the total cost as the rate decreases. As will be seen, by employing an image model based on an extremely simple and easy to satisfy prior, we can efficiently encode significance maps of wavelet coefficients.

B. Compression of Significance Maps Using Zerotrees of Wavelet Coefficients

To improve the compression of significance maps of wavelet coefficients, a new data structure called a zerotree is defined. A wavelet coefficient x is said to be insignificant with respect to a given threshold T if |x| < T. The zerotree is based on the hypothesis that if a wavelet coefficient at a coarse scale is insignificant with respect to a given threshold T, then all wavelet coefficients of the same orientation in the same spatial location at finer scales are likely to be insignificant with respect to T. Empirical evidence suggests that this hypothesis is often true.

More specifically, in a hierarchical subband system, with the exception of the highest frequency subbands,
every coefficient at a given scale can be related to a set of coefficients at the next finer scale of similar orientation. The coefficient at the coarse scale is called the parent, and all coefficients corresponding to the same spatial location at the next finer scale of similar orientation are called children. For a given parent, the set of all coefficients at all finer scales of similar orientation corresponding to the same location are called descendants. Similarly, for a given child, the set of coefficients at all coarser scales of similar orientation corresponding to the same location are called ancestors. For a QMF-pyramid subband decomposition, the parent-child dependencies are shown in Fig. 4. A wavelet tree descending from a coefficient in subband HH3 is also seen in Fig. 4. With the exception of the lowest frequency subband, all parents have four children. For the lowest frequency subband, the parent-child relationship is defined such that each parent node has three children.
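The parent-child relation can be computed directly from array positions. The sketch assumes a common in-place layout, which the text does not specify: a square array of side `size`, with the lowest frequency subband of side `size >> levels` in the top-left corner and each finer scale occupying the next larger L-shaped region. A coefficient in the lowest band then has three children, one per orientation at the coarsest detail scale; any other coefficient at (r, c) has its four children at (2r, 2c), (2r, 2c+1), (2r+1, 2c), and (2r+1, 2c+1); finest-scale coefficients have none. The function names are hypothetical.

```python
def children(r, c, size, levels):
    """Children of coefficient (r, c) under the assumed in-place layout."""
    h = size >> levels                    # side length of the lowest frequency band
    if r < h and c < h:                   # lowest band: three children
        return [(r, c + h), (r + h, c), (r + h, c + h)]
    if r >= size // 2 or c >= size // 2:  # finest scale: no children
        return []
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def descendants(r, c, size, levels):
    """All coefficients at finer scales that descend from (r, c)."""
    out = []
    stack = children(r, c, size, levels)
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(children(*node, size, levels))
    return out


# A 3-scale decomposition of an 8 x 8 array
print(children(0, 0, 8, 3), len(descendants(1, 1, 8, 3)))
```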

A scanning of the coefficients is performed in such a way that no child node is scanned before its parent. For an N-scale transform, the scan begins at the lowest frequency subband, denoted as LL_N, and scans subbands HL_N, LH_N, and HH_N, at which point it moves on to scale N − 1, etc. The scanning pattern for a 3-scale QMF-pyramid can be seen in Fig. 5. Note that each coefficient within a given subband is scanned before any coefficient in the next subband.
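Under the same assumed in-place layout, a scan that never visits a child before its parent can be generated band by band. The band order used here (top-right, bottom-left, then bottom-right at each scale, raster order inside each band) is an assumption of the sketch; how those positions map to the HL, LH, and HH labels depends on convention.

```python
def scan_order(size, levels):
    """Coarse-to-fine subband scan of a size x size decomposition with `levels` scales."""
    h = size >> levels
    order = [(r, c) for r in range(h) for c in range(h)]  # lowest frequency band first
    b = h
    while b < size:                                       # b = side of the bands at this scale
        for r0, c0 in ((0, b), (b, 0), (b, b)):           # three orientations per scale
            order.extend((r0 + r, c0 + c) for r in range(b) for c in range(b))
        b *= 2
    return order


order = scan_order(8, 3)
print(order[:8])
```

Every coefficient in a band is listed before the next band starts, and since a parent at (r, c) always lies in a coarser band than its children at (2r, 2c) and neighbors, parents always precede children.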

Given a threshold level T to determine whether or not a coefficient is significant, a coefficient x is said to be an element of a zerotree for threshold T if itself and all of its descendants are insignificant with respect to T. An element of a zerotree for threshold T is a zerotree root if it is not the descendant of a previously found zerotree root for threshold T, i.e., it is not predictably insignificant from the discovery of a zerotree root at a coarser scale at the same threshold. A zerotree root is encoded with a special symbol indicating that the insignificance of the coefficients at finer scales is completely predictable. The significance map can be efficiently represented as a string of symbols from a 3-symbol alphabet which is then entropy coded. The three symbols used are 1) zerotree root, 2) isolated zero, which means that the coefficient is insignificant but has some significant descendant, and 3) significant. When encoding the finest scale coefficients, since coefficients have no children, the symbols in the string come from a 2-symbol alphabet, whereby the zerotree symbol is not used.

As will be seen in Section IV, in addition to encoding the significance map, it is useful to encode the sign of significant coefficients along with the significance map. Thus, in practice, four symbols are used: 1) zerotree root, 2) isolated zero, 3) positive significant, and 4) negative significant. This minor addition will be useful for embedding. The flow chart for the decisions made at each coefficient is shown in Fig. 6.
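A minimal sketch of the four-symbol classification at a single threshold T, assuming the same hypothetical array layout as the parent-child sketch. It visits coefficients breadth-first from the lowest band, which keeps every parent ahead of its children but is not exactly the subband-by-subband scan described in the text, and it never visits descendants of a zerotree root. It also leaves out details of the complete algorithm, such as the treatment of coefficients already found significant at an earlier threshold. The symbol names `POS`, `NEG`, `IZ`, and `ZTR` are illustrative.

```python
from collections import deque


def children(r, c, size, levels):
    """Children of (r, c) under the assumed in-place layout."""
    h = size >> levels
    if r < h and c < h:
        return [(r, c + h), (r + h, c), (r + h, c + h)]
    if r >= size // 2 or c >= size // 2:
        return []
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def is_zerotree(coeffs, r, c, T, size, levels):
    """True if (r, c) and all of its descendants are insignificant for T."""
    stack = [(r, c)]
    while stack:
        i, j = stack.pop()
        if abs(coeffs[i][j]) >= T:
            return False
        stack.extend(children(i, j, size, levels))
    return True


def significance_map(coeffs, T, levels):
    """Classify visited coefficients as POS, NEG, IZ (isolated zero) or ZTR (zerotree root)."""
    size = len(coeffs)
    h = size >> levels
    queue = deque((r, c) for r in range(h) for c in range(h))
    symbols = []
    while queue:
        r, c = queue.popleft()
        x = coeffs[r][c]
        kids = children(r, c, size, levels)
        if x >= T:
            sym = "POS"
        elif x <= -T:
            sym = "NEG"
        elif kids and is_zerotree(coeffs, r, c, T, size, levels):
            sym = "ZTR"
        else:
            sym = "IZ"  # insignificant with a significant descendant, or a finest-scale coefficient
        symbols.append(((r, c), sym))
        if sym != "ZTR":
            queue.extend(kids)  # descendants of a zerotree root are never visited
    return symbols


example = [[40, 12, 3, 1],
           [-20, 2, 0, 0],
           [1, 0, 0, 0],
           [2, 0, 0, 18]]
print(significance_map(example, 16, 2))
```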

Note that it is also possible to include two additional symbols such as "positive/negative significant, but descendants are zerotrees" etc. In practice, it was found that at low bit rates, this addition often increases the cost of coding the significance map. To see why this may occur, consider that there is a cost associated with partitioning the set of positive (or negative) significant samples into those whose descendants are zerotrees and those with significant descendants. If the cost of this decision is C bits, but the cost of encoding a zerotree is less than C/4 bits, then it is more efficient to code four zerotree symbols separately than to use additional symbols.

Fig. 4. Parent-child dependencies of subbands. Note that the arrow points from the subband of the parents to the subband of the children. The lowest frequency subband is at the top left, and the highest frequency subband is at the bottom right. Also shown is a wavelet tree consisting of all of the descendants of a single coefficient in subband HH3. The coefficient in HH3 is a zerotree root if it is insignificant and all of its descendants are insignificant.

Fig. 5. Scanning order of the subbands for encoding a significance map. Note that parents must be scanned before children. Also note that all positions in a given subband are scanned before the scan moves on to the next subband.

Zerotree coding reduces the cost of encoding the significance map using self-similarity. Even though the image has been transformed using a decorrelating transform, the occurrences of insignificant coefficients are not independent events. More traditional techniques employing transform coding typically encode the binary map via some form of run-length encoding [30]. Unlike the zerotree symbol, which is a single "terminating" symbol and applies to all tree-depths, run-length encoding requires a symbol for each run-length
technique that is closer in spirit
of-block (EOB) symbol used
a "terminating" symbol in
DCT coefficients in the block

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 41, NO. 12, DECEMBER 1993

Fig. 6. Flow chart for encoding a coefficient of the significance map.

To see why zerotrees may provide an advantage over EOB symbols, consider that a zerotree represents the insignificance information in a given orientation over an approximately square spatial area at all finer scales up to and including the scale of the zerotree root. Because the wavelet transform is a hierarchical representation, varying the scale in which a zerotree root occurs automatically adapts the spatial area over which insignificance is represented. The EOB symbol, however, always represents insignificance over the same spatial area, although the number of frequency bands within that spatial area varies. Given a fixed block size, such as 8 x 8, there is exactly one scale in the wavelet transform at which a zerotree root corresponds to the same spatial area as a block of the DCT. If a zerotree root can be identified at a coarser scale, then the insignificance pertaining to that orientation can be predicted over a larger area. Similarly, if the zerotree root does not occur at this scale, then looking for zerotrees at finer scales represents a hierarchical divide and conquer approach to searching for one or more smaller areas of insignificance over the same spatial regions as the DCT block size. Thus, many more coefficients can be predicted in smooth areas where a root typically occurs at a coarse scale. Furthermore, the zerotree approach can isolate interesting nonzero details by immediately eliminating large insignificant regions from consideration.
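The adaptivity argument is simple arithmetic. Taking scale 1 as the finest (each coefficient there covers about a 2 x 2 pixel area), a zerotree root at scale k covers a 2^k x 2^k pixel region and marks 1 + 4 + ... + 4^(k-1) coefficients of one orientation as insignificant with a single symbol. The helper name below is hypothetical.

```python
def zerotree_coverage(root_scale):
    """Pixel side length and per-orientation coefficient count covered by a
    zerotree root at `root_scale` (1 = finest scale)."""
    side = 2 ** root_scale
    coefficients = sum(4 ** j for j in range(root_scale))  # root plus all descendants
    return side, coefficients


for k in range(1, 5):
    print(k, zerotree_coverage(k))
```

A root at scale 3 covers the same 8 x 8 area as a DCT block (21 coefficients in its orientation), while a root found one scale coarser covers 16 x 16 pixels and 85 coefficients, something a fixed-block EOB symbol cannot do.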

Note that this technique is quite different from previous attempts to exploit self-similarity in image coding [19] in that it is far easier to predict insignificance than to predict significant detail across scales. The zerotree approach was developed in recognition of the difficulty in achieving meaningful bit rate reductions for significant coefficients via additional prediction. Instead, the focus here is on reducing the cost of encoding the significance map so that, for a given bit budget, more bits are available to encode expensive significant coefficients. In practice, a large

C. Interpretation as a Simple Image Model
a coefficient at a coarse scale
to a threshold then all of its
are also insignificant, can
general image model. One
to be common to most models
that of a "decaying spectrum."
exists for both stationary au-
non-stationary fractal, or
implied by the name which re-
[33]. The model for the
more general than "decaying
for some deviations to "de-
it is linked to a specific thresh-
where the threshold is 50, and
of magnitude 30, and
has a magnitude of 40. Al-
descendant has a larger mag-
under consideration (30),
trum" hypothesis is violated, the
tion can still be represented us-
whole tree is still insignificant
Thus, assuming the more com-
validity, the zerotree hy-
easily and extremely often,
the hypothesis is violated, it
expect the cost of represen-
commensurate with its self-informa-
out that the improvement in
provided by zerotrees is spe-
exploiting any linear dependen-
different scales that were not



removed in the transform stage. In practice, the linear correlation between the values of parent and child wavelet coefficients has been found to be extremely small, implying that the wavelet transform is doing an excellent job of producing nearly uncorrelated coefficients. However, there is likely additional dependency between the squares (or magnitudes) of parents and children. Experiments run on about 30 images of all different types show that the correlation coefficient between the square of a child and the square of its parent tends to be between 0.2 and 0.6, with a strong concentration around 0.35. Although this dependency is difficult to characterize in general for most images, even without access to specific statistics, it is reasonable to expect the magnitude of a child to be smaller than the magnitude of its parent. In other words, it can be reasonably conjectured, based on experience with real-world images, that had we known the details of the statistical dependencies and computed an "optimal" estimate, such as the conditional expectation of the child's magnitude given the parent's magnitude, the "optimal" estimator would, with very high probability, predict that the child's magnitude would be the smaller of the two. Using only this mild assumption, based on an inexact statistical characterization, given a fixed threshold, and conditioned on the knowledge that a parent is insignificant with respect to the threshold, the "optimal" estimate of the significance of the rest of the descending wavelet tree is that it is entirely insignificant with respect to the same threshold, i.e., a zerotree. On the other hand, if the parent is significant, the "optimal" estimate of the significance of descendants is highly dependent on the details of the estimator, whose knowledge would require more detailed information about the statistical nature of the image. Thus, under this mild assumption, using zerotrees to predict the insignificance of wavelet coefficients at fine scales given the insignificance of a root at a coarse scale is more likely to be successful in the absence of additional information than attempting to predict significant detail across scales.
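The gap between "uncorrelated" and "independent" can be illustrated with synthetic data. This is a hypothetical toy model, not image statistics: parent and child share a hidden local scale but have independent signs and shapes. Their linear correlation is then essentially zero, while the correlation of their squares is clearly positive (about 0.18 in theory for the parameters chosen here).

```python
import random


def correlation(a, b):
    """Sample (Pearson) correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(1)
parents, kids = [], []
for _ in range(20000):
    local_scale = rng.choice([0.5, 2.0])                  # hypothetical shared "local activity"
    parents.append(local_scale * rng.gauss(0, 1))
    kids.append(0.5 * local_scale * rng.gauss(0, 1))      # independent sign/shape, smaller spread

linear = correlation(parents, kids)
squares = correlation([p * p for p in parents], [k * k for k in kids])
print(round(linear, 3), round(squares, 3))
```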

This argument can be made more concrete. Let x be a child of y, where x and y are zero-mean random variables whose probability density functions (PDF) are related as

$$p_x(x) = a\,p_y(ax), \qquad a > 1. \qquad (6)$$

This states that random variables x and y have the same PDF shape, and that

$$\sigma_y^2 = a^2\sigma_x^2. \qquad (7)$$

Assume further that x and y are uncorrelated, i.e.,

$$E[xy] = 0. \qquad (8)$$

Note that nothing has been said about treating the subbands as a group, or as stationary random processes, only that there is a similarity relationship between random variables of parents and children. It is also reasonable because, for intermediate subbands, a coefficient that is a child with respect to one coefficient is a parent with respect to others; the PDF of that coefficient should be the same in either case. Let u = x² and v = y². Suppose that u and



If it is observed that the magnitude of the parent is below the threshold T, i.e., v = y² < T², then the BLUE can be upper bounded by

$$\hat{u}\left(v < T^2\right) < \frac{\rho}{a^2}\,T^2 + (1-\rho)\,\sigma_x^2. \qquad (16)$$

Consider two cases: a) T ≥ σ_x and b) T < σ_x. In case (a),

$$\hat{u}\left(v < T^2\right) < \frac{\rho}{a^2}\,T^2 + (1-\rho)\,T^2 \le T^2, \qquad (17)$$

which implies that the BLUE of x² given |y| < T is less than T², for any ρ, including ρ = 0. In case (b), we can only upper bound the right-hand side of (16) by T² if ρ exceeds the lower bound
ferent results, but the
might yield dif-
analysis suggests that for
deviation of the parent,
deviation of all de-
udes of all descendants is that
occur, and more knowledge
ysis is that at very low bit
rates, where the probability of an insignificant sample must be high and thus the significance threshold T must also be large, expecting the occurrence of zerotrees and encoding significance maps using zerotree coding is reasonable without even knowing the statistics. However, letting T decrease, there is some point below which the advantage of zerotree coding diminishes, and this point is dependent on the specific nature of higher order dependencies between parents and children. In particular, the stronger this dependence, the more T can be decreased while still retaining an advantage using zerotree coding. Once again, this argument is not intended to "prove" the optimality of zerotree coding, only to suggest a rationale for its demonstrable success.

D. Zerotree-Like Structures in Other Subband Configurations

The concept of predicting the insignificance of coefficients from low frequency to high frequency information corresponding to the same spatial localization is a fairly general concept and not specific to the wavelet transform configuration shown in Fig. 4. Zerotrees are equally applicable to quincunx wavelets [2], [13], [23], [29], in which case each parent would have two children instead of four, except for the lowest frequency, where parents have a single child.

Also, a similar approach can be applied to linearly spaced subband decompositions, such as the DCT, and to other more general subband decompositions, such as wavelet packets [5] and Laplacian pyramids [4]. For example, one of many possible parent-child relationships for linearly spaced subbands can be seen in Fig. 7. Of course, with the use of linearly spaced subbands, zerotree-like coding loses its ability to adapt the spatial extent of the insignificance prediction. Nevertheless, it is possible for zerotree-like coding to outperform EOB-coding since more coefficients can be predicted from the subbands along the diagonal. For the case of wavelet packets, the situation is a bit more complicated, because a wider range of tilings of the "space-frequency" domain are possible. In that case, it may not always be possible to define similar parent-child relationships because a high-frequency coefficient may in fact correspond to a larger spatial area than a co-located lower frequency coefficient. On the other hand, in a coding scheme such as the "best-basis" approach of Coifman et al. [5], had the image-dependent best basis resulted in such a situation, one wonders if the underlying hypothesis, that magnitudes of coefficients tend to decay with frequency, would be reasonable anyway. These zerotree-like extensions represent interesting areas for further research.



The previous section describes a method of encoding significance maps of wavelet coefficients that, at least empirically, seems to consistently produce a code with a lower bit rate than either the empirical first-order entropy [...] using the ordering of the subbands shown in Fig. 5, all coefficients in a given subband appear on the initial dominant list prior to coefficients in the next subband. The subordinate list contains the magnitudes of those coefficients that have been found to be significant. For each threshold, each list is scanned once.

During a dominant pass, coefficients with coordinates on the dominant list, i.e., those that have not yet been found to be significant, are compared to the threshold $T_i$ to determine their significance, and if significant, their sign. This significance map is then zerotree coded using the method outlined in Section III. Each time a coefficient is encoded as significant (positive or negative), its magnitude is appended to the subordinate list, and the coefficient in the wavelet transform array is set to zero so that the significant coefficient does not prevent the occurrence of a zerotree on future dominant passes at smaller thresholds.
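One dominant pass can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the coder used in the experiments: it assumes a square array in the usual pyramid layout with the coarsest LL band in the top-left corner; a scan that visits LL and then the HL (top-right), LH (bottom-left), and HH bands of each scale from coarse to fine, in raster order within each band; and the standard parent-child rule in which (i, j) has children (2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1), while each coarsest LL coefficient has one child in each coarsest detail band. The function names and the test array are hypothetical, and symbols are returned as strings rather than entropy coded.

```python
def children(i, j, n, ll):
    """Children of (i, j) in an n x n pyramid whose coarsest LL band is ll x ll."""
    if i < ll and j < ll:
        # Coarsest-LL coefficient: one child in each coarsest HL, LH and HH band.
        return [(i, j + ll), (i + ll, j), (i + ll, j + ll)]
    if 2 * i >= n or 2 * j >= n:
        return []  # finest scale: no children
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]


def descendants(i, j, n, ll):
    """Every coefficient in the tree below (i, j), not including (i, j)."""
    out, stack = [], children(i, j, n, ll)
    while stack:
        a, b = stack.pop()
        out.append((a, b))
        stack.extend(children(a, b, n, ll))
    return out


def scan_order(n, ll):
    """LL first, then HL, LH, HH of each scale from coarse to fine; raster within a band."""
    order = [(i, j) for i in range(ll) for j in range(ll)]
    s = ll
    while s < n:
        for r0, c0 in ((0, s), (s, 0), (s, s)):  # HL (top-right), LH, HH
            order += [(r0 + i, c0 + j) for i in range(s) for j in range(s)]
        s *= 2
    return order


def dominant_pass(c, t, ll, significant):
    """Run one dominant pass at threshold t.

    c is modified in place: newly significant coefficients are set to 0 so they
    cannot block a zerotree on later passes. `significant` accumulates their
    coordinates. Returns (symbols, magnitudes for the subordinate list).
    """
    n = len(c)
    skip, symbols, found = set(), [], []
    for i, j in scan_order(n, ll):
        if (i, j) in skip or (i, j) in significant:
            continue  # predicted by a zerotree root, or already on the subordinate list
        v = c[i][j]
        desc = descendants(i, j, n, ll)
        if abs(v) >= t:
            symbols.append("POS" if v > 0 else "NEG")
            significant.add((i, j))
            found.append(abs(v))
            c[i][j] = 0
        elif not desc:
            symbols.append("Z")  # finest scale: IZ and ZTR merge into Z
        elif all(abs(c[a][b]) < t for a, b in desc):
            symbols.append("ZTR")
            skip.update(desc)  # the whole tree is predicted insignificant
        else:
            symbols.append("IZ")
    return symbols, found
```

For the hypothetical 4 × 4 array [[40, -20, 5, 3], [12, -6, 2, -1], [4, 18, 1, 0], [-3, 2, 1, -2]] with a 1 × 1 LL band, a pass at threshold 32 yields POS, ZTR, ZTR, ZTR. A following pass at 16 yields NEG, IZ, ZTR, then four Z symbols for the HL1 children of -20, then Z, POS, Z, Z for LH1. HH1 is skipped because its parent, -6, is a zerotree root. Recomputing descendants for every coefficient keeps the sketch short but is not efficient.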

A dominant pass is followed by a subordinate pass in which all coefficients on the subordinate list are scanned and the specifications of the magnitudes available to the decoder are refined to an additional bit of precision. More specifically, during a subordinate pass, the width of the effective quantizer step size, which defines an uncertainty interval for the true magnitude of the coefficient, is cut in half. For each magnitude on the subordinate list, this refinement can be encoded using a binary alphabet with a "1" symbol indicating that the true value falls in the upper half of the old uncertainty interval and a "0" symbol indicating the lower half. The string of symbols from this binary alphabet that is generated during a subordinate pass is then entropy coded. Note that prior to this refinement, the width of the uncertainty region is exactly equal to the current threshold. After the completion of a subordinate pass the magnitudes on the subordinate list are sorted in decreasing magnitude, to the extent that the decoder has the information to perform the same sort.
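A subordinate pass and the decoder-consistent re-sort can be sketched as below. The sketch assumes each entry on the subordinate list carries its true magnitude together with the decoder's current uncertainty interval [low, low + width); the dictionary keys and function name are hypothetical. Entries whose decoder-side magnitudes tie keep their previous order because Python's `list.sort` is stable (also with `reverse=True`), which matches the requirement that the encoder sort only as far as the decoder can.

```python
def subordinate_pass(entries):
    """Emit one refinement bit per entry, then re-sort as the decoder can.

    Each entry is a dict with the true magnitude "mag" and the decoder's
    uncertainty interval [low, low + width). Entries are updated in place.
    """
    bits = []
    for e in entries:
        half = e["width"] / 2
        bit = 1 if e["mag"] >= e["low"] + half else 0
        bits.append(bit)
        if bit:
            e["low"] += half  # upper half of the old interval
        e["width"] = half
    # Decreasing decoder-side magnitude (interval center); sort() is stable,
    # so entries the decoder cannot tell apart keep their previous order.
    entries.sort(key=lambda e: e["low"] + e["width"] / 2, reverse=True)
    return bits
```

With the four magnitudes 63, 34, 49, 47 from the simple example later in this paper, all starting in [32, 64), the pass emits 1, 0, 1, 0 and leaves the list ordered 63, 49, 34, 47, with decoder-side magnitudes 56, 56, 40, 40.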

The process continues to alternate between dominant passes and subordinate passes where the threshold is halved before each dominant pass. (In principle one could divide by other factors than 2. This factor of 2 was chosen here because it has nice interpretations in terms of bit plane encoding and numerical precision in a familiar base 2, and good coding results were obtained.)
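The threshold schedule itself is small enough to sketch directly. Choosing the initial threshold as the largest power of two not exceeding the largest coefficient magnitude is an assumption made here for illustration. It guarantees that at least one coefficient is significant on the first pass and that every magnitude is below twice the initial threshold. The halving step follows the text, and the function names are hypothetical.

```python
import math


def initial_threshold(max_magnitude):
    """Largest power of two T0 with T0 <= max_magnitude < 2 * T0 (assumed rule)."""
    return 2 ** math.floor(math.log2(max_magnitude))


def threshold_schedule(t0, passes):
    """Threshold for each dominant pass: T0, T0/2, T0/4, ... (halved before each pass)."""
    return [t0 / 2 ** k for k in range(passes)]


# Outer loop, with hypothetical encode_dominant / encode_subordinate placeholders:
# for t in threshold_schedule(initial_threshold(max_mag), n_passes):
#     encode_dominant(t)
#     encode_subordinate()
#     (stop as soon as the target rate or distortion is met)
```

For a largest magnitude of 63 this gives an initial threshold of 32 and the sequence 32, 16, 8, 4, ...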

In the decoding operation, each decoded symbol, both during a dominant and a subordinate pass, refines and reduces the width of the uncertainty interval in which the true value of the coefficient (or coefficients, in the case of a zerotree root) may occur. The reconstruction value used can be anywhere in that uncertainty interval. For minimum mean-square error distortion, one could use the centroid of the uncertainty region using some model for the PDF of the coefficients. However, a practical approach, which is used in the experiments, and is also MINMAX optimal, is to simply use the center of the uncertainty interval as the reconstruction value.
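The decoder's bookkeeping for a single coefficient and the midpoint reconstruction can be sketched as follows. The sketch assumes the coefficient was first found significant at threshold t, so its magnitude is known to lie in [t, 2t), and that each later subordinate bit halves that interval. The function names are hypothetical. Using the midpoint bounds the reconstruction error by half the interval width, which is the MINMAX property mentioned above.

```python
def decode_interval(t, refinement_bits):
    """Uncertainty interval [low, low + width) after significance at t plus refinements."""
    low, width = t, t              # dominant pass: magnitude in [t, 2t)
    for bit in refinement_bits:    # subordinate pass: keep upper (1) or lower (0) half
        width /= 2
        low += bit * width
    return low, width


def reconstruct(sign, low, width):
    """Midpoint reconstruction; the worst-case error is width / 2."""
    return sign * (low + width / 2)
```

A positive coefficient found significant at 32 is first reconstructed at 48. After one refinement bit it becomes 56 (bit "1") or 40 (bit "0").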

The encoding stops when some target stopping condition is met, such as when the bit budget is exhausted.
B. Relationship to Bit Plane Encoding

Although the embedded coding system described here is considerably more general and more sophisticated than simple bit-plane encoding, consideration of the relationship with bit-plane encoding provides insight into the success of embedded coding.

Consider the successive-approximation quantizer for the case when all thresholds are powers of two, and all wavelet coefficients are integers. In this case, for each coefficient that eventually gets coded as significant, the sign and bit position of the most-significant binary digit (MSBD) are measured and encoded during a dominant pass. For example, consider the 10-bit representation of the number 41 as 0000101001. Also, consider the binary digits as a sequence of binary decisions in a binary tree. Proceeding from left to right, if we have not yet encountered a "1," we expect the probability distribution for the next digit to be strongly biased toward "0." The digits to the left of and including the MSBD are called the dominant bits, and are measured during dominant passes. After the MSBD has been encountered, we expect a more random and much less biased distribution between a "0" and a "1," although we might still expect $P(0) > P(1)$ because most PDF models for transform coefficients decay with amplitude. Those binary digits to the right of the MSBD are called the subordinate bits and are measured and encoded during the subordinate pass. A zeroth-order approximation suggests that we should expect to pay close to one bit per "binary digit" for subordinate bits, while dominant bits should be far less expensive.

By using successive approximation beginning with the largest possible threshold, where the probability of zero is extremely close to one, and by using zerotree coding, whose efficiency increases as the probability of zero increases, we should be able to code dominant bits with very few bits, since they are most often part of a zerotree.

In general, the thresholds need not be powers of two. However, by factoring out a constant mantissa, $M$, the starting threshold $T_0$ can be expressed in terms of a threshold that is a power of two

$$T_0 = M \, 2^{E}, \qquad (19)$$

where the exponent $E$ is an integer, in which case, the dominant and subordinate bits of appropriately scaled wavelet coefficients are coded during dominant and subordinate passes, respectively.
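The split of a coefficient's binary digits into dominant and subordinate bits, and the factorization in (19), can be sketched as below. The sketch assumes integer magnitudes written with a fixed number of binary digits (10, as in the example of 41). For (19) it normalizes the mantissa to 1 <= M < 2; that normalization is an assumption, since the text only requires E to be an integer. Function names are hypothetical.

```python
import math


def split_dominant_subordinate(value, nbits=10):
    """Split the nbits-digit binary form of |value| (nonzero) at its MSBD.

    Dominant bits: every digit up to and including the most-significant 1.
    Subordinate bits: the digits to its right.
    """
    digits = format(abs(value), f"0{nbits}b")
    k = digits.index("1") + 1
    return digits[:k], digits[k:]


def mantissa_exponent(t0):
    """Return (M, E) with t0 = M * 2**E and 1 <= M < 2 (assumed normalization)."""
    m, e = math.frexp(t0)  # t0 = m * 2**e with 0.5 <= m < 1
    return 2 * m, e - 1
```

For 41 the dominant bits are 00001 and the subordinate bits are 01001. A starting threshold of 48 factors as M = 1.5, E = 5, and halving it to 24 gives M = 1.5, E = 4, so only E changes from pass to pass.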

C. Advantage of Small Alphabets for Adaptive 
Arithmetic Coding 

Note that the particular encoder alphabet used by the arithmetic coder at any given time contains either 2, 3, or 4 symbols depending on whether the encoding is for a subordinate pass, a dominant pass with no zerotree root symbol, or a dominant pass with the zerotree root symbol. This is a real advantage for adapting the arithmetic coder. Since there are never more than four symbols, all of the possibilities typically occur with reasonably high frequency. This allows an adaptation algorithm with a short memory to learn quickly and constantly track changing symbol probabilities. This adaptivity accounts for some of the effectiveness of the overall algorithm. Contrast this with the case of a large alphabet, as is the case in algorithms that do not use successive approximation. In that case, it takes many events before an adaptive entropy coder can reliably estimate the probabilities of unlikely symbols (see the discussion of the zero-frequency problem in [3]). Furthermore, these estimates are fairly unreliable because images are typically statistically nonstationary and local symbol probabilities change from region to region.
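The adaptive models themselves are simple, which is part of why small alphabets help. The sketch below follows the rule given for the practical coder in this subsection. Counts start at one, and the coded symbol's count is incremented. When the total reaches the maximum count (256 in the experiments), every count is incremented and integer-divided by two. The four dominant-pass contexts are selected by whether the previous coefficient in the scan and the parent are already known to be significant. Class and function names are hypothetical. The finest-scale three-symbol alphabet is not modeled, and the arithmetic coder of [31] that would consume these probabilities is not shown.

```python
class AdaptiveModel:
    """Symbol-count model for an adaptive arithmetic coder (sketch)."""

    def __init__(self, symbols, max_count=256):
        self.counts = {s: 1 for s in symbols}  # every entry starts at one
        self.max_count = max_count

    def probability(self, symbol):
        """Current probability estimate used to code `symbol`."""
        return self.counts[symbol] / sum(self.counts.values())

    def update(self, symbol):
        """Count the coded symbol; rescale once the total reaches max_count."""
        self.counts[symbol] += 1
        if sum(self.counts.values()) >= self.max_count:
            for s in self.counts:
                self.counts[s] = (self.counts[s] + 1) // 2  # increment, then halve


# Dominant passes: one model per (previous coefficient significant,
# parent significant) context. Subordinate passes: a single binary model.
DOMINANT_SYMBOLS = ("POS", "NEG", "IZ", "ZTR")
dominant_models = [AdaptiveModel(DOMINANT_SYMBOLS) for _ in range(4)]
subordinate_model = AdaptiveModel(("0", "1"))


def dominant_context(prev_significant, parent_significant):
    """Select one of the four dominant-pass histograms."""
    return dominant_models[2 * int(prev_significant) + int(parent_significant)]
```

With `max_count=8`, six updates with symbol "0" on a fresh binary model bring the total to 8. This triggers a rescale to counts of 4 and 1, so the estimate for "0" becomes 0.8. A small maximum count adapts quickly but caps how small any probability can get.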

In the practical coder used in the experiments, the arithmetic coder is based on [31]. In arithmetic coding, the encoder is separate from the model, which in [31] is basically a histogram. During the dominant passes, simple Markov conditioning is used whereby one of four histograms is chosen depending on 1) whether the previous coefficient in the scan is known to be significant, and 2) whether the parent is known to be significant. During the subordinate passes, a single histogram is used. Each histogram entry is initialized to a count of one. After encoding each symbol, the corresponding histogram entry is incremented. When the sum of all the counts in a histogram reaches the maximum count, each entry is incremented and integer divided by two, as described in [31]. It should be mentioned that for practical purposes, the coding gains provided by using this simple Markov conditioning may not justify the added complexity, and using a single histogram strategy for the dominant pass performs almost as well (0.12 dB worse for "Lena" at 0.25 bpp). The choice of maximum histogram count is probably more critical, since that controls the learning rate for the adaptation. For the experimental results presented, a maximum count of 256 was used, which provides an intermediate tradeoff between the smallest possible probability, which is the reciprocal of the maximum count, and the learning rate, which is faster for a smaller maximum histogram count.

D. Order of Importance of the Bits

Although importance is a subjective term, the order of processing used in EZW implicitly defines a precise ordering of importance that is tied to, in order, precision, magnitude, scale, and spatial location as determined by the initial dominant list.

The primary determination of ordering importance is the numerical precision of the coefficients. This can be seen in the fact that the uncertainty intervals for the magnitude of all coefficients are refined to the same precision before the uncertainty interval for any coefficient is refined further.

The second factor in the determination of importance is magnitude. Importance by magnitude manifests itself during a dominant pass because prior to the pass, all coefficients are insignificant and presumed to be zero. When they are found to be significant, they are all assumed to have the same magnitude, which is greater than the magnitudes of those coefficients that remain insignificant. Importance by magnitude manifests itself during a subordinate pass by the fact that magnitudes are refined in descending order of the center of the uncertainty intervals, i.e., the decoder's interpretation of the magnitude.

The third factor, scale, manifests itself in the a priori ordering of the subbands on the initial dominant list. Until the significance of the magnitude of a coefficient is discovered during a dominant pass, coefficients in coarse scales are tested for significance before coefficients in fine scales. This is consistent with prioritization by the decoder's version of magnitude since for all coefficients not yet found to be significant, the magnitude is presumed to be zero.

The final factor, spatial location, merely implies that two coefficients that cannot yet be distinguished by the decoder in terms of either precision, magnitude, or scale, have their relative importance determined arbitrarily by the initial scanning order of the subband containing the two coefficients.

In one sense, this embedding strategy has a strictly non-increasing operational distortion-rate function for the distortion metric defined to be the sum of the widths of the uncertainty intervals of all of the wavelet coefficients. Since a discrete wavelet transform is an invertible representation of an image, a distortion function defined in the wavelet transform domain is also a distortion function defined on the image. This distortion function is also not without a rationale for low-bit-rate coding, where noticeable artifacts must be tolerated, and perceptual metrics based on just-noticeable differences (JND's) do not always predict which artifacts human viewers will prefer. Since minimizing the widths of uncertainty intervals minimizes the largest possible errors, artifacts, which result from numerical errors large enough to exceed perceptible thresholds, are minimized. Even using this distortion function, the embedding strategy is not optimal, because truncation of the bit stream in the middle of a pass causes some uncertainty intervals to be twice as large as others.

Actually, as it has been described thus far, EZW is unlikely to be optimal for any distortion function. Notice that in (19), dividing the thresholds by two simply decrements $E$, leaving $M$ unchanged. While there must exist an optimal starting $M$ which minimizes a given distortion function, how to find this optimum is still an open question and seems highly image dependent. Without knowledge of the optimal $M$ and being forced to choose it based on some other consideration, with probability one, either increasing or decreasing $M$ would have produced an embedded code which has a lower distortion for the same rate. Despite the fact that without trial and error optimization for $M$, EZW is probably suboptimal, it is nevertheless quite effective in practice.

Note also that using the width of the uncertainty interval as a distance metric is exactly the same metric used in finite-precision fixed-point approximations of real numbers. Thus, the embedded code can be seen as an "image" generalization of finite-precision fixed-point approximations of real numbers.

E. Relationship to Priority-Position Coding

In a technique based on a very similar philosophy, Huang et al. discuss a related approach to embedding, or ordering the information in importance, called priority-position coding (PPC) [10]. They prove very elegantly that the entropy of a source is equal to the average entropy of a particular ordering of that source plus the average entropy of the position information necessary to reconstruct the source. Applying a sequence of decreasing thresholds, they attempt to sort by amplitude all of the DCT coefficients for the entire image based on a partition of the range of amplitudes. For each coding pass, they transmit the significance map, which is arithmetically encoded. Additionally, when a significant coefficient is found they transmit its value to its full precision. Like the EZW algorithm, PPC implicitly defines importance with respect to the magnitudes of the transform coefficients. In one sense, PPC is a generalization of the successive-approximation method presented in this paper, because PPC allows more general partitions of the amplitude range of the transform coefficients. On the other hand, since PPC sends the value of a significant coefficient to full precision, its protocol assigns a greater importance to the least significant bit of a significant coefficient than to the identification of new significant coefficients on the next PPC pass. In contrast, as a top priority, EZW tries to reduce the width of the largest uncertainty interval in all coefficients before increasing the precision further. Additionally, PPC makes no attempt to predict insignificance from low frequency to high frequency, relying solely on the arithmetic coding to encode the significance map. Also unlike EZW, the probability estimates needed for the arithmetic coder were derived via training on an image database instead of adapting to the image itself. It would be interesting to [...] of uncertainty intervals [...] using small alphabets) with [...] partitioning the range of amplitudes [...] coding gain, and there is certainly a much [...]
V. A SIMPLE EXAMPLE

In this section, a simple example will be used to highlight the order of operations used in the EZW algorithm. Only the string of symbols will be shown. The reader interested in the details of adaptive arithmetic coding is referred to [31]. Consider the simple 3-scale wavelet transform of an 8 × 8 image. The array of values is shown in Fig. 8. Since the largest coefficient magnitude is 63, we can choose our initial threshold to be anywhere in (31.5, 63]. Let $T_0 = 32$. Table I shows the processing on the first dominant pass. The following comments refer to Table I:

1) The coefficient has magnitude 63, which is greater than the threshold 32, and is positive, so a positive symbol is generated. After decoding this symbol, the decoder knows the coefficient is in the interval [32, 64), whose center is 48.

2) Even though the coefficient -31 is insignificant with respect to the threshold 32, it has a significant descendant two generations down in subband LH1 with magnitude 47. Thus, the symbol for an isolated zero is generated.

3) The magnitude 23 is less than 32 and all descendants, which include (3, -12, -14, 8) in subband HH2 and all coefficients in subband HH1, are insignificant. A zerotree symbol is generated, and no symbol will be generated for any coefficient in subbands HH2 and HH1 during the current dominant pass.

4) The magnitude 10 is less than 32 and all descendants (-12, 7, 6, -1) also have magnitudes less than 32. Thus a zerotree symbol is generated. Notice that this tree has a violation of the "decaying spectrum" hypothesis since a coefficient (-12) in subband HL1 has a magnitude greater than its parent (10). Nevertheless, the entire tree has magnitude less than the threshold 32, so it is still a zerotree.

5) The magnitude 14 is insignificant with respect to 32. Its children are (-1, 47, -3, 2). Since its child with magnitude 47 is significant, an isolated zero symbol is generated.

6) Note that no symbols were generated from subband HH2, which would ordinarily precede subband HL1 in the scan. Also note that since subband HL1 has no descendants, the entropy coding can resume using a 3-symbol alphabet where the IZ and ZTR symbols are merged into the Z (zero) symbol.

7) The magnitude 47 is significant with respect to 32. Note that for the future dominant passes, this position will be replaced with the value 0, so that for the next dominant pass at threshold 16, the parent of this coefficient, which has magnitude 14, can be coded using a zerotree root symbol.

During the first dominant pass, which used a threshold of 32, four significant coefficients were identified. These coefficients will be refined during the first subordinate pass. Prior to the first subordinate pass, the uncertainty interval for the magnitudes of all of the significant coefficients is the interval [32, 64). The first subordinate pass will refine these magnitudes and identify them as being either in the interval [32, 48), which will be encoded with the symbol "0," or in the interval [48, 64), which will be encoded with the symbol "1." Thus, the decision boundary is the magnitude 48. It is no coincidence that these symbols are exactly the first bit to the right of the MSBD in the binary representation of the magnitudes. The order [...] the uncertainty interval [32, [...] ordered for future subordinate passes (63, 49, 34, 47). Note that 49 is moved ahead of 34 because from the decoder's point of view, the reconstruction values 56 and 40 are distinguishable. However, the magnitude 34 must be ahead of magnitude 47 because as far as the decoder can tell, both have magnitude 40, and the initial order, which is based first on importance by scale, has 34 prior to 47.

The process continues on to the second dominant pass at the new threshold of 16. During this pass, only those coefficients not yet found to be significant are scanned. Additionally, those coefficients previously found to be significant are treated as zero for the purpose of determining whether a zerotree exists. [...] the coefficient -31 in subband LH3 [...] three coefficients in subband [...] LH1 and all four coefficients [...] the second dominant pass terminates [...] all other coefficients are predicted to be insignificant.

The processing continues alternating between dominant and subordinate passes and can stop at any time.
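The observation in this example, that the first subordinate-pass symbols equal the bit just to the right of the MSBD, can be checked mechanically. The sketch assumes integer magnitudes m and a power-of-two threshold t with t <= m < 2t, as in Section B. The function names are hypothetical.

```python
def refinement_bit_from_binary(m):
    """Digit immediately to the right of the MSBD of an integer magnitude m >= 2."""
    return int(format(m, "b")[1])


def refinement_bit_from_interval(m, t):
    """First subordinate symbol for m in [t, 2t): 1 for the upper half, 0 for the lower."""
    return 1 if m >= t + t // 2 else 0


magnitudes = [63, 34, 49, 47]  # found significant at threshold 32 in this example
bits = [refinement_bit_from_binary(m) for m in magnitudes]
```

Both rules give 1, 0, 1, 0 for these magnitudes, and they agree for every integer in [32, 64).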

VI. EXPERIMENTAL RESULTS

All experiments were performed by encoding and decoding an actual bit stream to verify the correctness of the algorithm. After a 12-byte header, the entire bit stream is arithmetically encoded using a single arithmetic coder with an adaptive model [31]. The model is initialized at each new threshold for each of the dominant and subordinate passes. From that point, the encoder is fully adaptive. Note in particular that there is no training of any kind, and no ensemble statistics of images are used in any way (unless one calls the zerotree hypothesis an ensemble statistic). The 12-byte header contains 1) the number of wavelet scales, 2) the dimensions of the image, 3) the maximum histogram count for the models in the arithmetic coder, 4) the image mean, and 5) the initial threshold. Note that after the header, there is no overhead except for an extra symbol for end-of-bit-stream, which is always maintained at minimum probability. This extra symbol is not needed for storage on computer medium if the end of a file can be detected.
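Packing the header fields into 12 bytes can be sketched with Python's `struct` module. The paper lists the fields but not their widths or encodings. The layout below is therefore a hypothetical choice that happens to total 12 bytes:

- big-endian byte order;
- an unsigned byte each for the number of scales and for a rounded image mean;
- 16-bit values for the width, height, and maximum histogram count;
- a 32-bit float for the initial threshold, so that a non-power-of-two mantissa M can be represented.

```python
import struct

# Hypothetical layout (the paper gives the fields, not their sizes):
# big-endian, no padding: scales (B), width (H), height (H),
# maximum histogram count (H), rounded image mean (B), initial threshold (f).
HEADER = struct.Struct(">BHHHBf")


def pack_header(scales, width, height, max_count, mean, t0):
    """Serialize the header fields (the dimensions take two slots) into 12 bytes."""
    return HEADER.pack(scales, width, height, max_count, mean, t0)


def unpack_header(data):
    """Inverse of pack_header; ignores anything after the header."""
    return HEADER.unpack(data[:HEADER.size])
```

For example, six scales, a 512 × 512 image, a maximum count of 256, a hypothetical mean of 124, and an initial threshold of 32 pack into exactly 12 bytes and unpack unchanged.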

The EZW coder was applied to the standard black and white 8 bpp test images, 512 × 512 "Lena" and the 512 × 512 "Barbara," which are shown in Figs. 9(a) and 11(a). Coding results for "Lena" are summarized in Table III and Fig. 9. Six scales of the QMF-pyramid were used. Similar results are shown for "Barbara" in Table IV and Fig. 10. Additional results for the 256 × 256 "Lena" are given in [22].

Quotes of PSNR for the 512 × 512 "Lena" image are so abundant throughout the image coding literature that it is difficult to definitively compare these results with other coding results.¹ However, a literature search has only found two published results where authors generate an actual bit stream that claims higher PSNR performance at rates between 0.25 and 1 bit/pixel, [12] and [21], the latter of which is a variation of the EZW algorithm. For the "Barbara" image, which is far more difficult than "Lena," the performance using EZW is substantially better, at least numerically, than the 27.82 dB for 0.534 bpp reported in [28].

The performance of the EZW coder was also compared to a widely available version of JPEG [14]. JPEG does not allow the user to select a target bit rate but instead allows the user to choose a "Quality Factor." In the experiments shown in Fig. 11, "Barbara" is encoded first using JPEG to a file size of 12 866 bytes, or a bit rate of 0.39 bpp. The PSNR in this case is 26.99 dB. The EZW encoder was then applied to "Barbara" with a target file size of exactly 12 866 bytes. The resulting PSNR is 29.39 dB, significantly higher than for JPEG. The EZW encoder was then applied to "Barbara" using a target PSNR to obtain exactly the same PSNR of 26.99 dB. The resulting file size is 8820 bytes, or 0.27 bpp. Visually, the 0.39 bpp EZW version looks better than the 0.39 bpp JPEG version. While there is some loss of resolution in both, there are noticeable blocking artifacts in the JPEG version. For the comparison at the same PSNR, one could probably argue in favor of the JPEG.

¹Actually there are multiple versions of the luminance-only "Lena" floating around, and the one used in [22] is darker and slightly more difficult than the "official" one obtained by this author from RPI after [22] was published. Also note that this should not be confused with results using only the green component of an RGB version, which are also commonly [...]

Fig. 9. Performance of EZW coder operating on "Lena." (a) Original 512 × 512 "Lena" image, 8 bits/pixel. (b) 1.0 bits/pixel, 8:1 compression, PSNR = 39.55 dB. (c) 0.5 bits/pixel, 16:1 compression, PSNR = 36.28 dB. (d) 0.25 bits/pixel, 32:1 compression, PSNR = 33.17 dB. (e) 0.0625 bits/pixel, 128:1 compression. (f) 0.015625 bits/pixel, 512:1 compression, PSNR = 23.63 dB.
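The rate figures quoted above follow from simple arithmetic on a 512 × 512, 8-bit image, sketched below. The PSNR helper uses the usual definition for 8-bit images, 10 log10(255² / MSE); that the experiments used exactly this peak value is an assumption here, and the helper names are hypothetical.

```python
import math


def bits_per_pixel(n_bytes, width=512, height=512):
    """Average coded bits per pixel for a file of n_bytes."""
    return 8 * n_bytes / (width * height)


def compression_ratio(bpp, source_bpp=8):
    """Ratio relative to the 8-bit source image."""
    return source_bpp / bpp


def psnr(mse, peak=255):
    """Peak signal-to-noise ratio in dB: 10 log10(peak^2 / MSE)."""
    return 10 * math.log10(peak ** 2 / mse)
```

For instance, 12 866 bytes gives about 0.393 bpp, reported as 0.39 bpp, and 8820 bytes gives about 0.269 bpp, reported as 0.27 bpp. A rate of 0.25 bpp corresponds to 32:1 compression.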

Another interesting figure of merit is the number of significant coefficients retained. DeVore et al. used wavelet transform coding to progressively encode the same image [8]. Using 68 272 bits (8534 bytes, 0.26 bpp), they retained 2019 coefficients and achieved a RMS error of 15.30 (MSE = 234, PSNR = 24.42 dB), whereas using the embedded coding scheme, 9774 coefficients are retained, using only 8192 bytes. The PSNR for these two examples differs by over 8 dB. Part of the difference can be attributed to the fact that the Haar basis was used in [8]. However, closer examination shows that zerotree coding provides a much better way of encoding the positions of the significant coefficients than was used in [8].

An interesting and perhaps surprising property of embedded coding is that when the encoding or decoding is terminated during the middle of a pass, or in the middle of the scanning of a subband, there are no artifacts produced that would indicate where the termination occurs. In other words, some coefficients in the same subband are represented with twice the precision of the others. A possible explanation of this phenomenon is that at low rates, there are so few significant coefficients that any one does not make a perceptible difference. Thus, if the last pass is a dominant pass, setting some coefficient that might be significant to zero may be imperceptible. Similarly, the fact that some have more precision than others is also imperceptible. By the time the number of significant coefficients becomes large, the picture quality is usually so good that adjacent coefficients with different precisions are imperceptible.

Another interesting property of the embedded coding is that because of the implicit global bit allocation, even at extremely high compression ratios, the performance scales. At a compression ratio of 512:1, the image quality of "Lena" is poor, but still recognizable. This is not the case with conventional block coding schemes, where at such high compression ratios, there would be insufficient bits to even encode the DC coefficients of each block.
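The claim about block coders at extreme ratios can be made concrete with a short calculation. Assume 8 × 8 blocks, as in baseline JPEG. A 512 × 512, 8-bit image at 512:1 then leaves 4096 bits for 4096 blocks, or one bit per block on average, which is far too little to code each block's DC coefficient. The function name and defaults are illustrative.

```python
def bits_per_block(ratio, width=512, height=512, block=8, source_bpp=8):
    """Average bit budget per block x block tile at a given compression ratio."""
    total_bits = width * height * source_bpp / ratio
    n_blocks = (width // block) * (height // block)
    return total_bits / n_blocks
```

At 32:1 the same image still has 16 bits per block on average, but at 512:1 only one.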

The unavoidable artifacts produced at low bit rates using this method are typical of wavelet coding schemes coded to the same PSNR's. However, subjectively, they are not nearly as objectionable as the blocking effects typical of block transform coding schemes.
Fig. 10. Performance of EZW coder operating on "Barbara" at (a) 1.0 bits/pixel, 8:1 compression, PSNR = 35.14 dB; (b) 0.5 bits/pixel, 16:1 compression; (c) 0.125 bits/pixel, 64:1 compression, PSNR = 24.03 dB; (d) 0.0625 bits/pixel, 128:1 compression, PSNR = 23.10 dB.



VII. CONCLUSION

A new technique for image coding has been presented that produces a fully embedded bit stream. Furthermore, the compression performance of this algorithm is competitive with virtually all known techniques. The remarkable performance can be attributed to the use of the following four features:

1) a discrete wavelet transform, which decorrelates most sources fairly well, and allows the more significant bits of precision of most coefficients to be efficiently encoded as part of exponentially growing zerotrees;
2) zerotree coding, which, by predicting insignificance across scales using an image model that is easy for most images to satisfy, provides substantial coding gains over the first-order entropy for significance maps;
3) successive-approximation, which allows the coding of multiple significance maps using zerotrees, and allows the encoding or decoding to stop at any point;
4) adaptive arithmetic coding, which allows the entropy coder to incorporate learning into the bit stream itself.

The precise rate control that is achieved with this algorithm is a distinct advantage. The user can choose a bit rate and encode the image to exactly the desired bit rate. Furthermore, since no training of any kind is required, the algorithm is fairly general and performs remarkably well with most types of images.

Fig. 11. (a) Original 512 × 512 "Barbara" image. (b) EZW at 12 866 bytes, 0.39 bits/pixel, 29.39 dB. (c) EZW at 8820 bytes, 0.27 bits/pixel, 26.99 dB. (d) JPEG at 12 866 bytes, 0.39 bits/pixel, 26.99 dB.